Abstract - The theory of community structure is a powerful tool for real networks, which can
simplify their topological and functional analysis considerably. However, since community detection
methods have random factors and real social networks obtained from complex systems always
contain error edges, evaluating the robustness of community structure is an urgent and important
task. In this letter, we employ the critical threshold of the resolution parameter in the Hamiltonian
function, $\gamma_c$, to measure the robustness of a network. According to spectral theory, a rigorous proof
shows that the proposed index is inversely related to the robustness of community structure.
Furthermore, by utilizing the co-evolution model, we provide a new efficient method for computing
the value of $\gamma_c$. The research can be applied to broad clustering problems in network analysis
and data mining due to its solid mathematical basis and experimental effects.





1. Introduction. - Community structure detection
is a hotspot of social network studies. It has attracted
much attention from various scientific fields. Generally,
a community refers to a group of nodes in the network
that are more densely connected to each other than to the rest of
the network. A well-known approach to this problem is
the concept of modularity, which was proposed by Newman
et al. to quantify a network's partition. Optimizing
modularity is effective for community structure detection
and has been widely used in many real networks [6]. However,
as pointed out by Fortunato et al. [3], modularity
is restricted by the resolution limit problem, which concerns
the reliability of the communities detected
through the optimization methods. Complementary to the
modularity concept, many efforts are devoted to understanding
the properties of dynamic processes taking place
in the underlying networks. Specifically, researchers have
begun to investigate the correlation between community
structure and dynamic systems, such as synchronization
[4] and random walk processes.

In the real world, network topology changes over time.
The analysis of community structure in evolving networks
has been regarded as a "Holy Grail" by network scientists.
A famous example is the karate club network constructed
by Wayne Zachary in the 1970s [7]. During the course
of his study, a dispute arose between the club's administrator
and principal karate teacher over whether to raise
club fees, and the club eventually split into two smaller
clubs, centered around the administrator (node 1) and the
teacher (node 33), as shown in Fig. 1(a). It can be assumed
that the relationships between members of the karate club
at the very beginning are not robust, and a small perturbation
may cause a complete change of the topology.
Why initially loose relationships evolve into or out
of community structure is a very interesting question,
since community structure has a great impact on human
organizational structure, rumor and epidemic spreading,
network attack effects and congestion control.


Given a network, it is meaningless to detect communities
when the community structure is not robust: if a small
change in the network, for example an edge added here
or there, can completely change the outcome (significance
or stability) of the community structure, then we argue that
the network is not robust and the result cannot be trusted.
In this letter we focus on this imperative task
and prove that the critical threshold of the resolution parameter in the
Hamiltonian function, $\gamma_c$, can measure the robustness of
a network.

Fig. 1: (a) The community structure of the karate club network
detected by Wayne Zachary; (b) A network can be partitioned
into more and more communities when $\gamma$ increases; (c) A strong
4-community structure in network A and a weak 4-community
structure in network B.

For any given network, robustness information can be
derived from $\gamma_c$ directly and conveniently without using
particular partition algorithms. A rigorous proof based on spectral theory is then
used to show that the proposed index is inversely related
to the robustness of community structure.
Furthermore, to calculate the
value of $\gamma_c$, a new efficient method is provided based on
the co-evolution model. The method can be applied to broad
clustering problems in network analysis and data mining
due to its solid mathematical basis and efficiency.

2. Potts model and the critical resolution parameter $\gamma$. - The Potts model [5] is a powerful thermodynamic
method, which has been widely applied to uncover
community structure in networks. We use the multiresolution
Potts method to carry out the study. Given a network
G and the corresponding adjacency matrix $A = \{A_{ij}\}$, community
structure can be determined by optimizing the
infinite N-state Hamiltonian function:

$$H(\gamma) = \sum_{i \neq j} J_{ij}(\gamma)\,\delta_{C_i,C_j} = \sum_{i \neq j} \left(A_{ij} - \gamma P_{ij}\right)\delta_{C_i,C_j}, \tag{1}$$

where $C_i$ represents the community (state) to which node
(spin) i belongs, $\gamma$ is the resolution parameter, $P_{ij}$ represents
the expected number of edges between nodes i and j
in the null model, and $J(\gamma)$ is the coupling matrix whose entries
$J_{ij}$ represent the interaction strength between nodes
i and j.
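
As a minimal sketch of how a Hamiltonian of this form can be evaluated for a given partition, the snippet below scores two partitions of a small toy graph. The graph, the function name `potts_hamiltonian`, the null model $P_{ij} = 1 - A_{ij}$, and the convention that a larger value marks the preferred partition are all assumptions made for illustration; if the opposite sign convention is intended, the comparison simply flips.

```python
# Toy graph (an assumption for illustration): two triangles joined by one edge.
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
N_NODES = 6


def potts_hamiltonian(edges, n_nodes, labels, gamma):
    """Sum over ordered pairs i != j of (A_ij - gamma * P_ij) * delta(C_i, C_j).

    Assumes the null model P_ij = 1 - A_ij (1 for a missing edge, 0 otherwise).
    """
    edge_set = {frozenset(e) for e in edges}
    total = 0.0
    for i in range(n_nodes):
        for j in range(n_nodes):
            if i == j or labels[i] != labels[j]:
                continue  # excluded diagonal, or delta(C_i, C_j) = 0
            a_ij = 1.0 if frozenset((i, j)) in edge_set else 0.0
            p_ij = 1.0 - a_ij
            total += a_ij - gamma * p_ij
    return total


one_community = [0, 0, 0, 0, 0, 0]
two_triangles = [0, 0, 0, 1, 1, 1]

for gamma in (0.0, 0.5):
    h_one = potts_hamiltonian(EDGES, N_NODES, one_community, gamma)
    h_two = potts_hamiltonian(EDGES, N_NODES, two_triangles, gamma)
    print(f"gamma={gamma}: H(one community)={h_one}, H(two triangles)={h_two}")
```

On this toy graph the single community scores 14 against 12 at $\gamma = 0$, while at $\gamma = 0.5$ it scores 6 against 12, so raising the resolution parameter favors the finer partition.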

In the Potts model, the resolution parameter $\gamma$ is an important
indicator of the dynamics of community structures. By
tuning the value of $\gamma$, we can detect community structure
at multiple scales. Specifically, when the value of $\gamma$ increases,
a network is divided into more and smaller communities,
as shown in Fig. 1(b). If we define $\gamma_c$ as the minimal $\gamma$
value for dividing the network into C communities, then $\gamma_c$
can be naturally used to indicate the stability or significance
of the C-community structure. For example, Fig. 1(c)
shows a strong 4-community structure in network A and a
weak 4-community structure in network B. It can be easily
estimated that $\gamma_4(A) < \gamma_4(B)$. Based on the analysis
above, the following theorem can be obtained:

Theorem 1. If the Hamiltonian function with $\gamma_c$ divides
network G into C communities, this result is the weakest
one that just meets the definition of a community, i.e. the number
of intra-community edges is equal to the number of
inter-community edges.
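
The definition of the threshold as the minimal resolution value at which the network splits into C communities can be illustrated by brute force on a very small graph. The sketch below is not the method of this letter: it enumerates every partition of a six-node toy graph, assumes the null model $P_{ij} = 1 - A_{ij}$ and that the preferred partition has the largest Hamiltonian value, and scans a coarse grid of $\gamma$ values. All names are hypothetical.

```python
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]  # toy graph
N_NODES = 6
EDGE_SET = {frozenset(e) for e in EDGES}


def set_partitions(n):
    """Yield every partition of n items as a list of community labels."""
    def extend(prefix, n_used):
        if len(prefix) == n:
            yield list(prefix)
            return
        for label in range(n_used + 1):  # reuse a label or open a new one
            prefix.append(label)
            yield from extend(prefix, max(n_used, label + 1))
            prefix.pop()
    yield from extend([], 0)


def hamiltonian(labels, gamma):
    """H = sum_{i != j} (A_ij - gamma * (1 - A_ij)) * delta(C_i, C_j)."""
    total = 0.0
    for i in range(N_NODES):
        for j in range(i + 1, N_NODES):
            if labels[i] == labels[j]:
                a_ij = 1.0 if frozenset((i, j)) in EDGE_SET else 0.0
                total += 2.0 * (a_ij - gamma * (1.0 - a_ij))  # both orderings
    return total


def n_communities_at(gamma):
    """Number of communities in the best-scoring partition at this gamma."""
    best = max(set_partitions(N_NODES), key=lambda lab: hamiltonian(lab, gamma))
    return len(set(best))


def critical_gamma(n_comm, grid):
    """Smallest gamma on the grid whose best partition has >= n_comm communities."""
    for gamma in grid:
        if n_communities_at(gamma) >= n_comm:
            return gamma
    return None


GRID = [0.05 * k for k in range(21)]  # gamma = 0.00, 0.05, ..., 1.00
gamma_2 = critical_gamma(2, GRID)
print("smallest grid gamma giving 2 communities:", gamma_2)
```

For this toy graph the scan returns the grid point 0.15, which brackets the exact crossover $14 - 16\gamma = 12$ at $\gamma = 1/8$. Exhaustive enumeration is only feasible for a handful of nodes, which is why a cheaper estimate of the threshold is of interest.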

The proof of Theorem 1 is straightforward. In addition, the
profiles of networks with different scales and types of connectivity
can be compared using $\gamma_c$. These differences are
defined as "network distances". For example, the Hamiltonian
function containing $\gamma_c$ is used to measure the distance between
networks m and n:

$$d_{mn} = \sum_{a=1}^{k} \left| H(\gamma_a^m) - H(\gamma_a^n) \right|, \tag{2}$$

where $H(\gamma_a^m)$ is the Hamiltonian function with parameter
$\gamma_a^m$ in network m. "Network distance" in this form can be
applied without considering the differences of connectivity
between various networks, such as size, type of degree
distribution and sparsity, and it is convenient for analyzing
the information hidden behind the topology. However, estimating
the value of $\gamma_c$ is a tough job, which up to now can only be
done by optimization methods. In this study,
we use $\gamma_c$ to directly quantify the robustness of a
given network. Since few studies have shown the dynamic
changes of $\gamma$, we focus on this novel issue and reveal the
relationship between $\gamma_c$ and the network's robustness in the
next section.
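
The distance of Eq. (2) is a sum of absolute differences between matched Hamiltonian values, which the following sketch makes explicit. The function name and the numerical values are hypothetical; in practice the inputs would be the Hamiltonian evaluated at the k critical thresholds of each network.

```python
def network_distance(h_m, h_n):
    """d_mn = sum_a |H(gamma_a^m) - H(gamma_a^n)| over k matched thresholds."""
    if len(h_m) != len(h_n):
        raise ValueError("both networks need the same number k of Hamiltonian values")
    return sum(abs(x - y) for x, y in zip(h_m, h_n))


# Hypothetical Hamiltonian values at the critical thresholds of two networks.
h_values_m = [12.0, 9.5, 7.0]
h_values_n = [10.0, 9.0, 8.5]
print(network_distance(h_values_m, h_values_n))
```

With these made-up values the distance is 2.0 + 0.5 + 1.5 = 4.0, and the measure is symmetric in the two networks.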

3. The relationship between $\gamma$ and the robustness. - In this section, a typical case is studied to prove that
$\gamma_c$ is able to quantify the robustness directly. For an
undirected and unweighted graph G with N nodes and
L edges, the topology is characterized by the associated
adjacency matrix $A = \{A_{ij}\}$. The network is partitioned into C communities,
and each community is labeled by r (r = 1, ..., C).
We denote the number of inner links connecting pairs
of members inside community r as $l_{in}^r$, and the number of
inter-community links as $l_{out}^r$, i.e. the number of links connecting
a member of community r to a member of
another community. With this notation,
$L = \sum_{r=1}^{C} l_{in}^r + \frac{1}{2}\sum_{r=1}^{C} l_{out}^r$.
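
The bookkeeping behind this identity can be checked directly: every inter-community link is seen once from each of its two communities, hence the factor 1/2. The sketch below counts inner links and outgoing link ends per community for a toy partition; the function name and the toy graph are assumptions for illustration.

```python
def link_counts(edges, labels):
    """Return (l_in, l_out): per-community inner links and outgoing link ends."""
    communities = set(labels)
    l_in = {r: 0 for r in communities}
    l_out = {r: 0 for r in communities}
    for i, j in edges:
        if labels[i] == labels[j]:
            l_in[labels[i]] += 1
        else:  # an inter-community link is seen once from each side
            l_out[labels[i]] += 1
            l_out[labels[j]] += 1
    return l_in, l_out


# Toy example: two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
labels = [0, 0, 0, 1, 1, 1]
l_in, l_out = link_counts(edges, labels)
total = sum(l_in.values()) + 0.5 * sum(l_out.values())
print(l_in, l_out, total)
```

Here each triangle has three inner links and one outgoing link end, so the sum reproduces the seven edges of the graph.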

Next, the hyper-graph G* associated to network G is
defined as the weighted directed C-clique in which each
node corresponds to one community in G. In G*, the connection
linking node r to node s is weighted by $l_{out}^{rs}/l_{in}^r$, where
$l_{out}^{rs}$ represents the number of links of G that connect members
of the community r with members of the community
s, and $l_{in}^r$ represents the number of inner links in the source
community r. The corresponding C x C Laplacian matrix
$F = \{F_{rs}\}$ is asymmetric, but can be written as a product
$F = \Theta\Lambda$, where $\Lambda = \{\Lambda_{rs}\}$ is a symmetric zero row-sum
matrix with off-diagonal elements $\Lambda_{rs} = -l_{out}^{rs}$ and diagonal
ones $\Lambda_{rr} = \sum_{s \neq r} l_{out}^{rs}$, and $\Theta = \mathrm{diag}\{1/l_{in}^1, \ldots, 1/l_{in}^C\}$.
Then F is

$$F = \Theta\Lambda = \begin{pmatrix}
\frac{1}{l_{in}^1}\sum_{s \neq 1} l_{out}^{1s} & -\frac{l_{out}^{12}}{l_{in}^1} & \cdots & -\frac{l_{out}^{1C}}{l_{in}^1} \\
-\frac{l_{out}^{21}}{l_{in}^2} & \frac{1}{l_{in}^2}\sum_{s \neq 2} l_{out}^{2s} & \cdots & -\frac{l_{out}^{2C}}{l_{in}^2} \\
\vdots & \vdots & \ddots & \vdots \\
-\frac{l_{out}^{C1}}{l_{in}^C} & -\frac{l_{out}^{C2}}{l_{in}^C} & \cdots & \frac{1}{l_{in}^C}\sum_{s \neq C} l_{out}^{Cs}
\end{pmatrix}. \tag{3}$$

The spectrum of F consists of non-negative real values, and F is zero
row-sum. Then the smallest eigenvalue of F, $\lambda_1^F$, is zero,
while the second smallest one satisfies $\lambda_2^F > 0$. The measure we
propose for the dynamic quality of community
structure is defined as follows:

$$\xi = H\,\lambda_2^F. \tag{4}$$

In fact, H is an inherent evaluation function to compute
the significance of community structure, based on spectral
theory [5]. $\lambda_2^F$ is able to quantify the connectivity
of the hyper-graph, and therefore measures the extent to
which different communities are bound together and interact.
It should be noticed that both H and $\lambda_2^F$ are properly normalized,
so that even if the network links were associated
to cohesive forces, the two quantities would be dimensionless.
The maximum of $\xi$ corresponds to a topology in
which the community structure is most significant, and is
thus crucial to this study.

Let us then consider the case that the C communities are
all cliques with equal size $N_c = N/C$. The number
of intra-community links $l_{in}^r$, as well as the number of
inter-community links $l_{out}^r$, are the same for all communities.
Then, as $l_{in}^r = l_{in}/C$ and $l_{out}^r = l_{out}/C$,
$L = C(l_{in}^r + \frac{1}{2} l_{out}^r) = l_{in} + \frac{1}{2} l_{out}$. Under these assumptions, the
Hamiltonian function of Eq. (1) can be simplified to the
following expression:

$$H = C\left[\frac{l_{in}^r}{L} - \gamma\left(\frac{2 l_{in}^r + l_{out}^r}{2L}\right)^2\right] = 1 - \frac{l_{out}}{2L} - \frac{\gamma}{C}. \tag{5}$$

In addition, since in this symmetric case $\Lambda = \frac{l_{out}}{C(C-1)}(C\,I - J)$, with I the identity and J the all-ones matrix,
and $\Theta = \frac{C}{l_{in}} I$, according to the matrix identity
there is $\lambda_2^F = \frac{C}{C-1}\,\frac{l_{out}}{l_{in}}$. Combining $\lambda_2^F$ and H, the
function $\xi$ is derived as follows:

$$\xi = \frac{C}{C-1}\left(1 - \frac{l_{out}}{2L} - \frac{\gamma}{C}\right)\frac{l_{out}}{L - \frac{1}{2} l_{out}}. \tag{6}$$

To study the dynamic characteristics of networks [9], a
particular protocol is adopted by increasing $l_{out}$ at each
step from 0 to $2L(1 - \gamma/C)$, the value at which $\xi$ and H are
zero. Given a fully modularized configuration (in which
$l_{out} = 0$ and $l_{in} = L$), we conduct successive rewiring
processes, i.e. in each step one intra-community link in
each community is deleted, and C inter-community
links are formed by connecting those pairs of
nodes (each one in a different community) which lost their
intra-community link. In this way, at the j-th rewiring, there are
$l_{in} = L - Cj$ and $l_{out} = 2Cj$. Accordingly, the partial
derivative of $\xi$ with respect to the number of inter-community
edges $l_{out}$ is calculated as follows:


$$\frac{\partial \xi}{\partial l_{out}} = \frac{C}{C-1}\,\frac{\left(1 - \frac{\gamma}{C}\right)L - l_{out} + \frac{l_{out}^2}{4L}}{\left(L - \frac{1}{2} l_{out}\right)^2}. \tag{7}$$

When $\partial\xi/\partial l_{out} = 0$, $l_{out}^{\max} = 2L\left(1 - \sqrt{\gamma/C}\right)$. The function $\xi$
reaches its maximum there, since the second derivative with respect to $l_{out}$
is indeed negative. According to the expression for
$l_{out}^{\max}$, when $\gamma = C$,
$l_{out}^{\max} = 0$. In this case, no inter-community edges exist
and the original network cannot be perturbed any more
(increasing $l_{out}$ will decrease $\xi$). On the contrary,
when $\gamma = 0$, there is $l_{out}^{\max} = 2L$. At this time, the network
is indeed perturbable because increasing $l_{out}$ will also increase
$\xi$ until all edges are inter-community edges. In this
situation, each node is a single community and only belongs
to itself. In a special intermediate case, when $\gamma = C/4$, there
is $l_{out}^{\max} = L$, and the number of intra-community edges
($l_{in} = L/2$) equals the number of inter-community ones ($l_{out}/2 = L/2$). According
to the definition of a community, this is just the threshold testing
whether the community structure emerges. As explained
above, this $\gamma$ is just the critical value $\gamma_c$, and $\gamma_c$ is closely
related to robustness: if the value of $\gamma_c$ increases, the size
of the imperturbable (un-robust) area also increases accordingly,
as shown in Fig. 2. This conclusion is reflected
in the following theorem:

Theorem 2. The larger $\gamma_c$, the lower the robustness of a
given network including C communities, and vice versa.

For any given network, robustness information can be
derived from $\gamma_c$ conveniently without particular algorithms,
since $\gamma_c$ is only determined by the network's topology.
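
The equal-clique calculation of this section can be checked numerically. The sketch below assumes the closed forms $H = 1 - l_{out}/2L - \gamma/C$ and $\lambda_2^F = C\,l_{out}/((C-1)\,l_{in})$, which follow from the definitions above when inter-community links are spread evenly over community pairs. It locates the maximum of $\xi = H\lambda_2^F$ on a grid and compares it with $2L(1 - \sqrt{\gamma/C})$, then confirms the eigenvalue on an explicitly built hyper-graph Laplacian. Function names and parameter values are hypothetical.

```python
import math


def xi(l_out, L, C, gamma):
    """xi = H * lambda_2 for C equal-sized communities (assumed closed forms)."""
    l_in = L - 0.5 * l_out
    h = 1.0 - l_out / (2.0 * L) - gamma / C
    lambda_2 = C * l_out / ((C - 1) * l_in)
    return h * lambda_2


def best_l_out(L, C, gamma, steps=20000):
    """Grid search for the l_out in [0, 2L(1 - gamma/C)) that maximises xi."""
    upper = 2.0 * L * (1.0 - gamma / C)
    # The upper endpoint, where H = 0, is excluded so that l_in stays positive.
    grid = [upper * k / steps for k in range(steps)]
    return max(grid, key=lambda x: xi(x, L, C, gamma))


def hypergraph_laplacian(l_in_r, l_out_rs):
    """F = Theta * Lambda: F_rs = -l_out_rs / l_in_r (r != s), rows sum to zero."""
    C = len(l_in_r)
    F = [[0.0] * C for _ in range(C)]
    for r in range(C):
        for s in range(C):
            if r != s:
                w = l_out_rs[r][s] / l_in_r[r]
                F[r][s] = -w
                F[r][r] += w
    return F


L, C, gamma = 1000.0, 4, 1.0
numeric = best_l_out(L, C, gamma)
predicted = 2.0 * L * (1.0 - math.sqrt(gamma / C))
print(numeric, predicted)

# Symmetric case: l_out split evenly over the C*(C-1) ordered community pairs.
l_out = 1000.0
l_in = L - 0.5 * l_out
F = hypergraph_laplacian([l_in / C] * C,
                         [[l_out / (C * (C - 1))] * C for _ in range(C)])
v = [1.0, -1.0] + [0.0] * (C - 2)  # orthogonal to the all-ones vector
Fv = [sum(F[r][s] * v[s] for s in range(C)) for r in range(C)]
print(Fv[0] / v[0], C * l_out / ((C - 1) * l_in))
```

The chosen values correspond to $\gamma = C/4$, for which the predicted maximum sits at $l_{out} = L$; the grid search agrees to within one grid step.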

4. A novel method to calculate the critical $\gamma$. - As proved by Theorem 2, $\gamma_c$ can be used to quantify a
network's robustness. However, the calculation of $\gamma_c$ is a
tough task and few studies have addressed this problem.
Fortunately, Theorem 1 provides a feasible way to perform the
calculation, i.e. $\gamma_c$ can be obtained by calculating $\gamma$ in a
C-community structure when $l_{in} = l_{out}$. In this part, a two-stage
method is proposed to calculate $\gamma_c$. The detailed
procedures are described as follows.

Fig. 2: The change of $\xi$ with intra-community edges $l_{in}$. $\xi$
reaches its maximal value at $l_{in}^{\max}$. At the same time, the community
structure is most significant.

4.1. The relationship between $\gamma$ and community edge
density. In unweighted graphs, the "inverse adjacency
matrix" P is defined by $P_{ij} = 1$ if there is no edge between
nodes i and j, and $P_{ij} = 0$ otherwise. In general,
we set $P_{ij} = 1 - A_{ij}$. Since the inner sum of $A_{ij}$ over the nodes of community r, i.e.
$\sum_{i \neq j} A_{ij}\delta_{C_i,C_j}$, is the number of edges in community r,
and the sum of $P_{ij} = 1 - A_{ij}$, i.e. $\sum_{i \neq j} (1 - A_{ij})\delta_{C_i,C_j}$, is
the number of missing edges in community r (each up to the double counting of a pair), Eq. (1) can
be rewritten as

$$H = \sum_r \left[\, l_{in}^r - \gamma\left(e_r - l_{in}^r\right) \right], \tag{8}$$

where the inner sum of $A_{ij}$ has been rewritten in terms
of the number of existing edges $l_{in}^r$, $e_r$ is the maximal number
of edges that community r could have, and the number
of missing edges is $e_r - l_{in}^r$. With minor rearrangement,
Eq. (8) can be transformed into an "edge density" form:

$$H = \sum_r e_r \left[\, p_r - \gamma\left(1 - p_r\right) \right], \tag{9}$$

where the edge density $p_r$ of community r is defined as
$p_r = l_{in}^r / e_r$.

If the energy of a given community is attractive and has
a binding pattern, the term $p_r - \gamma(1 - p_r)$ must be positive.
Rearranging Eq. (9) provides a relationship between the
resolution parameter $\gamma$ and the critical (minimum) edge
density $p^*$:

$$p_r > \frac{\gamma}{1 + \gamma} \equiv p^*. \tag{10}$$

Based on Eq. (10), two important inferences can be obtained:

$$p^* = \frac{\gamma}{1 + \gamma}, \tag{11}$$

and

$$\gamma = \frac{p^*}{1 - p^*}. \tag{12}$$

4.2. The co-evolution model. We set $p^* = \alpha$, and
substitute $\gamma = \frac{\alpha}{1-\alpha}$ into the Hamiltonian function of Eq. (1):

$$H = \sum_{i \neq j} \left(A_{ij} - \frac{\alpha}{1-\alpha} P_{ij}\right)\delta_{C_i,C_j}. \tag{13}$$

Then, the term $1 - \alpha$ is extracted and an equivalent function
is derived:

$$H^* = \sum_{i \neq j} \left((1 - \alpha)A_{ij} - \alpha P_{ij}\right)\delta_{C_i,C_j}, \tag{14}$$

i.e. $H^* = (1 - \alpha)H$, which has the same optimal partitions as H for $0 \le \alpha < 1$.

In Eq. (14), we consider $\alpha$ as a special probability, and the
value of $\alpha$ lies between 0 and 1. To optimize Eq. (14),
the following steps are needed: at each step, a vertex i
is picked randomly. If its degree k(i) = 0, nothing happens.
For k(i) > 0, (i) with probability $1 - \alpha$, a random
neighbor j of i is selected and node j is put into the same
community as i, i.e. we set $\delta_{C_i,C_j} = 1$; (ii) otherwise, with
probability $\alpha$, an edge attached to vertex i is selected and
the other end of this edge is rewired to a randomly chosen
vertex in the same community as i. This process continues
until no edge connects individuals in different
communities.
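
The update rule just described can be sketched as a small simulation. This is an illustrative reading of the two steps rather than a reference implementation: the function name `coevolve`, the toy ring-with-chords graph, the choice to pick the edge uniformly among those attached to i, the decision to keep multi-edges created by rewiring, and the step cap are all assumptions.

```python
import random


def coevolve(n, edges, labels, alpha, rng, max_steps=200_000):
    """Run the label-copy / rewiring dynamics until no inter-community edge is left.

    Returns (labels, adjacency lists, number of remaining inter-community edges).
    Multi-edges created by rewiring are kept (an assumption of this sketch).
    """
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    labels = list(labels)

    def discordant():
        return sum(1 for a in range(n) for b in adj[a]
                   if a < b and labels[a] != labels[b])

    for step in range(max_steps):
        if step % n == 0 and discordant() == 0:
            break                          # only intra-community edges remain
        i = rng.randrange(n)
        if not adj[i]:
            continue                       # k(i) = 0: nothing happens
        idx = rng.randrange(len(adj[i]))
        j = adj[i][idx]
        if rng.random() < 1.0 - alpha:
            labels[j] = labels[i]          # (i) j joins the community of i
        else:
            same = [k for k in range(n) if k != i and labels[k] == labels[i]]
            if not same:
                continue
            k = rng.choice(same)           # (ii) rewire edge (i, j) to (i, k)
            adj[i][idx] = k
            adj[j].remove(i)
            adj[k].append(i)
    return labels, adj, discordant()


# Toy input: a 30-node ring with a few chords and alternating initial labels.
n = 30
ring = [(i, (i + 1) % n) for i in range(n)]
chords = [(i, (i + 7) % n) for i in range(0, n, 3)]
edges = ring + chords
start = [i % 2 for i in range(n)]

final, adj, left = coevolve(n, edges, start, alpha=1.0, rng=random.Random(0))
print("remaining inter-community edges:", left)
```

With $\alpha = 1$ only rewiring occurs, so the labels stay fixed and the graph is cut into components of uniform label; smaller $\alpha$ mixes in label copying. The number of edges is conserved in either case.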

This dynamical evolutionary process can be considered
as a special case of the famous Holme-Newman model.
There are two extremes corresponding to the value of $\alpha$.
When $\alpha = 1$, only rewiring steps (step ii) occur. Once all
of the L edges have been touched, the graph has been split into C
components, each consisting of individuals who share the
same label. Because none of the states have changed, the
components are small (i.e., their sizes are Poisson distributed
with mean N/C). According to classical results for the
coupon collector's problem, approximately L log L updates are
required. In contrast, for $\alpha = 0$, this system
reduces to the voter model on a static graph. If we suppose
that the initial graph is an ER random graph in which
each vertex has average degree $\langle k \rangle > 1$, then there is a
"giant component" that contains a positive fraction
of the vertices, and the second largest component is small,
having only O(log N) vertices. The voter model on the
giant component will reach a consensus in $O(N^2)$ steps.

To pursue the study, a two-state Potts model (with the two
spin states called 0 and 1) is adopted instead
of a number of states proportional to the size of the graph. This
model is also called the Ising model. As in the Holme-Newman
model, the final fraction u of nodes with the minority
spin state undergoes a discontinuous transition at
a value of $\alpha$ that does not depend on the initial density. Fig. 3
shows results of simulations for our method starting from
an initial graph that is an ER random graph with N = 10000
nodes and average degree $\langle k \rangle = 4$. Spin values are initially
assigned randomly with the probability of state 1 given by the
fraction u = 0.5, 0.25, 0.1, and 0.05. The figure shows the
final fraction u of nodes with the minority spin state from
five scenarios for each u. Although nodes
with state 1 are in the minority, this minority state will reach a
stable state instead of being "assimilated" or "swallowed"
by state 0. In community analysis, this phenomenon is
equivalent to being free of the "resolution limit" problem pointed
out by Fortunato et al. [3], where the modularity Q has
an intrinsic scale beyond which small qualified communities
cannot be detected by maximizing the modularity.

Fig. 3: The change of the fraction in the minority state with $\alpha$ on
ER random graphs with N = 10000 nodes and average degree
$\langle k \rangle = 4$. The initial fraction in the minority state u equals 0.5, 0.25, 0.1
and 0.05, respectively.

4.3. The computation of $\alpha_c$. Since $\gamma = \frac{\alpha}{1-\alpha}$ as analyzed
above, $\alpha$ increases monotonically with $\gamma$. If $\alpha_c$ is defined
as the critical threshold of $\alpha$ when $l_{in} = l_{out}$ in a
C-community structure, then the value of
$\gamma_c$ can be estimated from the critical threshold $\alpha_c$ using the
relationship $\gamma_c = \frac{\alpha_c}{1-\alpha_c}$. Here we focus on the case
C = 2 and find a solution that can be extended to
more general cases.

For methodology, we use mean-field theory with a Markov
dynamical process to compute $\alpha_c$. First, let $L_{xy}$ be the
number of edges between adjacent nodes with states x and y, and
let $L_{xyz}$ be the number of oriented triples x-y-z of
adjacent sites with states x, y, z, respectively. $L_{01}$ is equal
to $L_{10}$ and, for example, in the 0-1-0 case all such triples will
be counted twice. The approach is limited to dense
graphs, where the general statistics are the number of
homomorphisms of some small graphs (labeled by ones and
zeros in our case) into the random graph being studied.
It is common to use the pair approximation (PA), which
in essence assumes that the equilibrium state is a Markov
chain: $L_{100} = L_{10}L_{00}/(\omega N_0)$, where $N_0$ is the number of
vertices in state 0, and $\omega$ is the clustering coefficient of the
network. Using mean-field theory and algebraic transformation,
the following theorem can be deduced:

Theorem 3. Defining $\langle k \rangle$ as the average degree, u as
the fraction of nodes with the minority spin state, N as the
number of nodes, and $\omega$ as the clustering coefficient, the
critical threshold $\alpha_c$ and the number of inter-community
edges satisfy $\alpha_c = 1 - \frac{\omega}{\langle k \rangle}$ and
$L_{01} = N u (1 - u)\left(\langle k \rangle - \frac{\omega}{1-\alpha}\right)$, respectively.

Proof. The calculations presented here are inspired by
similar equations in Kimura and Hayakawa. According
to the mechanism of our model, by considering all of
the possible changes, the differential equations can
be established as follows:

$$\frac{1}{2}\frac{dL_{10}}{dt} = -2L_{10} + (1 - \alpha)\left[L_{011} - L_{101} + L_{100} - L_{010}\right], \tag{15}$$

$$\frac{1}{2}\frac{dL_{11}}{dt} = L_{10} + (1 - \alpha)\left[L_{101} - L_{011}\right], \tag{16}$$

and

$$\frac{1}{2}\frac{dL_{00}}{dt} = L_{10} + (1 - \alpha)\left[L_{010} - L_{100}\right]. \tag{17}$$


That this notation is more natural than dividing
by 2 to eliminate overcounting can be seen by observing
that, if k(x) is the degree of node x, there are $\sum_{ij} L_{ij} = \sum_x k(x)$
and $\sum_{ijk} L_{ijk} = \sum_x k(x)[k(x) - 1]$. Also,
$L_{11} + 2L_{10} + L_{00} = \langle k \rangle N = 2L$, where L is the number of edges, and
the sum of the three differential equations, i.e. Eqs. (15)-(17),
is zero.

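Taking the steady-state solution of these equations and using the pair
approximation as before, we get

$$\frac{L_{10}}{1 - \alpha} = \frac{L_{01}L_{11}}{\omega u N} - \frac{L_{10}L_{01}}{\omega (1 - u) N}, \tag{18}$$

and

$$\frac{L_{10}}{1 - \alpha} = \frac{L_{10}L_{00}}{\omega (1 - u) N} - \frac{L_{01}L_{10}}{\omega u N}. \tag{19}$$

Eq. (18) and Eq. (19) lead to the equations

$$\frac{L_{11}}{\omega u N} - \frac{L_{10}}{\omega (1 - u) N} = \frac{1}{1 - \alpha}, \tag{20}$$

and

$$\frac{L_{00}}{\omega (1 - u) N} - \frac{L_{10}}{\omega u N} = \frac{1}{1 - \alpha}. \tag{21}$$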


Adding uN times Eq. (20) to (1 - u)N times Eq. (21) and multiplying by $\omega$,
we have

$$L_{11} + L_{00} - \left(\frac{u}{1 - u} + \frac{1 - u}{u}\right)L_{01} = \frac{\omega N}{1 - \alpha}. \tag{22}$$


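When $L_{01} = 0$, only intra-community edges exist and we have
$L_{11} + L_{00} = \langle k \rangle N$. The threshold is then obtained:

$$\alpha_c = 1 - \frac{\omega}{\langle k \rangle}. \tag{23}$$

Using Eq. (22) and $L_{11} + L_{00} = \langle k \rangle N - 2L_{01}$, we have

$$\left[\langle k \rangle - \frac{\omega}{1 - \alpha}\right]N = \frac{L_{01}}{u(1 - u)}. \tag{24}$$

The number of inter-community edges satisfies

$$\frac{L_{01}}{N} = u(1 - u)\left(\langle k \rangle - \frac{\omega}{1 - \alpha}\right). \tag{25}$$

The proof is completed.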
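
The two closed forms of Theorem 3 are easy to evaluate and to cross-check against the steady-state balance from which they are derived. The sketch below assumes the formulas $\alpha_c = 1 - \omega/\langle k \rangle$ and $L_{01}/N = u(1-u)(\langle k \rangle - \omega/(1-\alpha))$; the function names and all input values are hypothetical and chosen only to keep the arithmetic transparent.

```python
def alpha_critical(mean_degree, omega):
    """alpha_c = 1 - omega / <k>."""
    return 1.0 - omega / mean_degree


def inter_edge_density(alpha, u, mean_degree, omega):
    """L_01 / N = u (1 - u) (<k> - omega / (1 - alpha)), for alpha < 1."""
    return u * (1.0 - u) * (mean_degree - omega / (1.0 - alpha))


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
k_mean, omega, u, n_nodes = 4.0, 0.5, 0.25, 1000

a_c = alpha_critical(k_mean, omega)
print(a_c, inter_edge_density(a_c, u, k_mean, omega))

# Consistency check against the steady-state balance for some alpha < alpha_c:
# L_11 + L_00 - (u/(1-u) + (1-u)/u) L_01 = omega N / (1 - alpha),
# with L_11 + L_00 = <k> N - 2 L_01.
alpha = 0.5
l_01 = n_nodes * inter_edge_density(alpha, u, k_mean, omega)
lhs = (k_mean * n_nodes - 2.0 * l_01) - (u / (1.0 - u) + (1.0 - u) / u) * l_01
rhs = omega * n_nodes / (1.0 - alpha)
print(lhs, rhs)
```

For these inputs the threshold is 0.875, the predicted density of inter-community edges vanishes exactly there, and both sides of the balance equal 1000 up to rounding.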
Fig. 4: The performance of $\alpha_c$ on both GN and LFR networks.
(a) In the GN network, $\alpha_c$ increases with
$\langle k_{out} \rangle$. The community structure varies from clear to vague
as $\alpha_c$ goes from 0.3 to 1. (b) In the LFR benchmark,
the average degree is $\langle k \rangle = 20$, the maximum degree is 50 and
$P(k) \propto k^{\beta}$. Maximum and minimum community sizes are 50
and 20, respectively. As the mixing parameter $\delta$ increases, the
$\alpha_c$ index increases.

The approach is limited to dense graphs. As $\gamma_c = \frac{\alpha_c}{1-\alpha_c}$,
$\alpha$ increases monotonically with $\gamma$, and then we
get $\gamma_c = \frac{\langle k \rangle}{\omega} - 1$. This approximation is simple and
convenient to compute in large-scale networks by using
sampling techniques. Although $\gamma_c$ is derived from the
two-state Ising model, it can be directly applied to networks
with more than 2 communities, since the quantities $\langle k \rangle$ and
$\omega$ are only determined by the network topology without using
any partition algorithm.
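
A minimal sketch of how the estimate $\gamma_c = \langle k \rangle/\omega - 1$ could be computed from topology alone is given below. The text does not specify which clustering coefficient $\omega$ denotes; the sketch assumes the global transitivity (closed connected triples over all connected triples), and the function names and toy graph are hypothetical. The toy graph is not dense, so it only illustrates the arithmetic.

```python
def mean_degree(adj):
    """Average degree <k> of an undirected graph given as {node: set(neighbors)}."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def transitivity(adj):
    """Fraction of connected triples (paths of length 2) that close into triangles."""
    closed = 0
    triples = 0
    for nbrs in adj.values():
        nb = sorted(nbrs)
        for a in range(len(nb)):
            for b in range(a + 1, len(nb)):
                triples += 1
                if nb[b] in adj[nb[a]]:
                    closed += 1
    return closed / triples if triples else 0.0


def gamma_critical(k_mean, omega):
    """gamma_c = <k> / omega - 1, equivalently alpha_c / (1 - alpha_c)."""
    return k_mean / omega - 1.0


# Toy graph: two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = {v: set() for v in range(6)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

k_mean = mean_degree(adj)
omega = transitivity(adj)
alpha_c = 1.0 - omega / k_mean
print(k_mean, omega, alpha_c, gamma_critical(k_mean, omega))
```

For large networks the two inputs could be estimated from a sample of nodes instead of a full pass, in line with the sampling remark above.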

5. Experiments. - We test the index on both the
classical GN benchmark presented by Girvan and Newman
[2] and the more challenging LFR benchmark proposed by
Lancichinetti, Fortunato and Radicchi [14]. The GN network
has n = 128 nodes that are divided into 4 communities
with 32 nodes each. Each node is connected on average to
$\langle k_{in} \rangle = 16$ nodes of its own group and $\langle k_{out} \rangle$ nodes of the rest
of the network. The total degree of each node is always
kept constant and equals $k = \langle k_{in} \rangle + \langle k_{out} \rangle$. In the
LFR benchmark, each node is given a degree taken from
a power-law distribution with a positive exponent. Additionally,
each node shares a fraction $\delta$ of its links with nodes of other
communities, where $\delta$ is the mixing parameter. The clarity
of community structure can be adjusted by the mixing
parameter $\delta$.

As is well known, the communities become fuzzier and
thus more difficult to identify when $\langle k_{out} \rangle$ and $\delta$ increase.
Hence, the robustness of the community structure
will tend to be weaker and the $\alpha_c$ index will increase.
The numerical results of the $\alpha_c$ value for both $\langle k_{out} \rangle$ and $\delta$
are shown in Fig. 4. The figure indicates that the index $\alpha_c$
works well in these networks: when the community structure
is very clear, $\alpha_c$ is near 0.2-0.3; when the network is
nearly a random one, the corresponding $\alpha_c$ is very close to
1. Thus, this method shows a great ability in characterizing
the properties of modular structure: the lower the
$\alpha_c$ (or $\gamma_c$) index is, the more robust the community structure
will be.


In order to verify our method, it is also applied to three
famous artificial networks: the ER random graph, the BA
scale-free network, and the P&S network, where the number
of nodes is 10,000 and the average degree is 3 in all cases. The
experimental result indicates that the P&S model is the most
robust one. We also test the method on real networks,
and the results are shown and analyzed in the Supplementary
Material.

6. Conclusion. - In summary, this letter presents a
new community analysis method which uncovers
the connection between the robustness of community structure
and the critical threshold of the resolution parameter $\gamma_c$.
Based on the theoretical analysis, a novel computation
method is developed to quantify $\gamma_c$ using co-evolution
theory. The effectiveness and efficiency of the method are demonstrated
and verified, and it can be applied to broad problems in
network analysis and data mining due to its solid
mathematical basis.